

Learning to Compose Visual Relations

Neural Information Processing Systems

[Figure: top-1 image-text retrieval results on iGibson scenes for relational queries (e.g., "a large brown metal cube below a large green rubber cylinder"), comparing CLIP, fine-tuned CLIP, and the proposed method.]


Learning to Reason with Mixture of Tokens

Jain, Adit, Rappazzo, Brendan

arXiv.org Artificial Intelligence

Reinforcement learning with verifiable rewards (RLVR) has become a leading approach for improving large language model (LLM) reasoning capabilities. Most current methods follow variants of Group Relative Policy Optimization, which samples multiple reasoning completions, scores them relative to each other, and adjusts the policy accordingly. However, these approaches invariably sample discrete tokens at each reasoning step, discarding the rich distributional information in the model's probability distribution over candidate tokens. While preserving and utilizing this distributional information has proven beneficial in non-RL settings, current RLVR methods seem to be unnecessarily constraining the reasoning search space by not using this information. To address this limitation, we investigate mixture-of-token generation (MoT-G) in RLVR. We present a unified framework that generalizes existing MoT-G approaches, including training-free methods that construct mixture embeddings as weighted sums over token embeddings, and extends RLVR to operate directly in this continuous mixture space for generating chain-of-thought. Evaluating two MoT-G variants on Reasoning-Gym, a suite of reasoning-intensive language tasks, we find that MoT-G methods achieve substantial improvements (5-35% gains on 7 out of 10 tasks) compared to standard decoding with the Qwen2.5-1.5B model, while reaching comparable accuracy with half the number of trajectories, suggesting improved training efficiency. Through comprehensive hidden-state and token-level analyses, we provide evidence that MoT-G's benefits may stem from its ability to maintain higher hidden-state entropy throughout the reasoning process and promote exploration in token space.
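The core idea, a mixture embedding formed as a probability-weighted sum over token embeddings, can be sketched in a few lines of NumPy. The embedding table, vocabulary size, and logits below are toy stand-ins, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16

# Toy embedding table and output-head logits standing in for an LLM's.
embedding_table = rng.normal(size=(vocab_size, d_model))
logits = rng.normal(size=vocab_size)

# Softmax over the vocabulary at one reasoning step.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Standard decoding collapses the distribution to a single discrete token.
hard_embedding = embedding_table[np.argmax(probs)]

# Mixture-of-token generation instead feeds back the probability-weighted
# sum over ALL token embeddings, preserving distributional information.
mixture_embedding = probs @ embedding_table  # shape: (d_model,)
```

Because the mixture is a convex combination of token embeddings, it stays inside the convex hull of the embedding table while remaining a fully differentiable function of the model's distribution.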




LLMs Can Plan Only If We Tell Them

Sel, Bilgehan, Jia, Ruoxi, Jin, Ming

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated significant capabilities in natural language processing and reasoning, yet their effectiveness in autonomous planning has been under debate. While existing studies have utilized LLMs with external feedback mechanisms or in controlled environments for planning, these approaches often involve substantial computational and development resources due to the requirement for careful design and iterative backprompting. Moreover, even the most advanced LLMs, such as GPT-4, struggle to match human performance on standard planning benchmarks such as Blocksworld without additional support. This paper investigates whether LLMs can independently generate long-horizon plans that rival human baselines. Our novel enhancements to Algorithm-of-Thoughts (AoT), which we dub AoT+, achieve state-of-the-art results on planning benchmarks, out-competing prior methods and human baselines, all autonomously.


MEWL: Few-shot multimodal word learning with referential uncertainty

Jiang, Guangyuan, Xu, Manjie, Xin, Shiji, Liang, Wei, Peng, Yujia, Zhang, Chi, Zhu, Yixin

arXiv.org Artificial Intelligence

Without explicit feedback, humans can rapidly learn the meaning of words. Children can acquire a new word after just a few passive exposures, a process known as fast mapping. This word learning capability is believed to be the most fundamental building block of multimodal understanding and reasoning. Despite recent advancements in multimodal learning, a systematic and rigorous evaluation is still missing for human-like word learning in machines. To fill this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes. MEWL covers humans' core cognitive toolkit in word learning: cross-situational reasoning, bootstrapping, and pragmatic learning. Specifically, MEWL is a few-shot benchmark suite consisting of nine tasks for probing various word learning capabilities. These tasks are carefully designed to align with children's core abilities in word learning and echo theories in the developmental literature. By evaluating multimodal and unimodal agents' performance with a comparative analysis of human performance, we observe a sharp divergence between human and machine word learning. We further discuss these differences between humans and machines and call for human-like few-shot word learning in machines.


Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

Zhang, Yan, Zhang, David W., Lacoste-Julien, Simon, Burghouts, Gertjan J., Snoek, Cees G. M.

arXiv.org Machine Learning

Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equivariance and improve it with approximate implicit differentiation, allowing for better optimization while being faster and saving memory. In a range of toy experiments, we show that the perspective of multiset-equivariance is beneficial and that our changes to DSPN achieve better results in most cases. On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation.
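The representational gap between set- and multiset-equivariance can be seen with a toy Deep Sets style layer: a permutation-equivariant map built from a shared per-element transform plus a symmetric pooled summary must send equal inputs to equal outputs, so it can never assign distinct targets to duplicated elements. The layer below is an illustrative stand-in, not DSPN itself:

```python
import numpy as np

def set_equivariant_layer(x):
    # Shared elementwise transform plus a permutation-invariant pooled
    # summary: a Deep Sets style layer of the form f(x_i) + g(mean(x)).
    return np.tanh(x) + x.mean(axis=0)

# A multiset with an exact duplicate: the two copies of 1.0 necessarily
# receive identical outputs under any set-equivariant map.
x = np.array([[1.0], [1.0], [3.0]])
y = set_equivariant_layer(x)
assert np.allclose(y[0], y[1])
```

A task that requires separating the duplicates (for example, predicting two overlapping objects at slightly different positions) is therefore unrepresentable by such a layer, which is the motivation for the weaker multiset-equivariance property.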


Learning to Compose Visual Relations

Liu, Nan, Li, Shuang, Du, Yilun, Tenenbaum, Joshua B., Torralba, Antonio

arXiv.org Artificial Intelligence

The visual world around us can be described as a structured set of objects and their associated relations. An image of a room may be conjured given only the description of the underlying objects and their associated relations. While there has been significant work on designing deep neural networks which may compose individual objects together, less work has been done on composing the individual relations between objects. A principal difficulty is that while the placement of objects is mutually independent, their relations are entangled and dependent on each other. To circumvent this issue, existing works primarily compose relations by utilizing a holistic encoder, in the form of text or graphs. In this work, we instead propose to represent each relation as an unnormalized density (an energy-based model), enabling us to compose separate relations in a factorized manner. We show that such a factorized decomposition allows the model to both generate and edit scenes that have multiple sets of relations more faithfully. We further show that decomposition enables our model to effectively understand the underlying relational scene structure.
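The factorized composition can be sketched as summing the energies of independent relation models: a scene satisfying every relation has low total energy. The hand-written energy functions and (x, y) scene layout below are illustrative assumptions, not the paper's learned energy-based models:

```python
import numpy as np

def energy_above(scene):
    # Low (zero) energy when object 0 sits above object 1.
    # Rows of `scene` are objects; columns are (x, y) positions.
    return max(0.0, scene[1, 1] - scene[0, 1])

def energy_left_of(scene):
    # Low (zero) energy when object 0 is to the left of object 2.
    return max(0.0, scene[0, 0] - scene[2, 0])

def composed_energy(scene, relations):
    # Factorized composition: the energy of a conjunction of relations
    # is the sum of the individual relations' energies.
    return sum(E(scene) for E in relations)

scene = np.array([[0.0, 2.0],   # object 0 at (0, 2)
                  [0.5, 1.0],   # object 1 at (0.5, 1)
                  [2.0, 0.0]])  # object 2 at (2, 0)
total = composed_energy(scene, [energy_above, energy_left_of])
```

Since each relation is an unnormalized density, adding energies corresponds to multiplying the densities, so independently trained relations compose without any holistic text or graph encoder.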


Deep Set Prediction Networks

Zhang, Yan, Hare, Jonathon, Prügel-Bennett, Adam

arXiv.org Machine Learning

We study the problem of predicting a set from a feature vector with a deep neural network. Existing approaches ignore the set structure of the problem and suffer from discontinuity issues as a result. We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem. With a single feature vector as input, we show that our model is able to auto-encode point sets, predict bounding boxes of the set of objects in an image, and predict the attributes of these objects in an image.


This advanced neural network can explain its thought process (finally)

#artificialintelligence

The type of artificial intelligence known as a neural network can be trained to complete tasks once thought to be exclusive to humans, such as driving a car, creating visual art, or composing a heavy metal album. But neural networks have a big problem: they're really complicated. They're so complex, in fact, that researchers have often struggled to explain precisely why they make specific decisions. Now, researchers at the Massachusetts Institute of Technology (MIT) say they've created a neural network that can explain the steps it took to solve a problem -- an advance that could help us better understand how the technology works and alleviate safety concerns in riskier applications, like self-driving cars. The new algorithm, called the Transparency by Design Network (TbD-net), breaks down the process of recognizing an image into subtasks.